load libraries
library(dplyr)
library(ggplot2)
library(tidyr)
library(plotly)This document created to make easier understanding data that we are working.
In this document we will observe some tables:
ds_year (describes the number of titles and the number of copies for each year, since 2018)
ds_genre (describes the number of titles and the number of copies for each year, by genre)
ds_georaphy (describes the number of titles and the number of copies for each year, by region)
ds_language (describes the number of titles and the number of copies for each year, by language)
ds_pubtype (describes the number of titles and the number of copies for each year, by publication type)
ds_ukr_rus (describes the number of titles and the number of copies, and the percentage of Ukrainian and Russian books for each year)
P.s: All tables contain information for 2005–2024, except for ds_language
Load the libraries used in this report.
library(dplyr)
library(ggplot2)
library(tidyr)
library(plotly)We will be working with data that describes two main measures: the number of titles (title_count) and the number of copies (copy_count). Several specific tables provide these measures under different conditions.
getwd()
ds_genre <- readRDS("../../data-private/derived/manipulation/ds_genre.rds") # load ds_genre
ds_year <- readRDS("../../data-private/derived/manipulation/ds_year.rds") # load ds_year
ds_language <- readRDS("../../data-private/derived/manipulation/ds_language.rds") # load ds_language
ds_geography <- readRDS("../../data-private/derived/manipulation/ds_geography.rds") # load ds_geography
ds_pubtype <- readRDS("../../data-private/derived/manipulation/ds_pubtype.rds") # load ds_pubtype
ds_ukr_rus <- readRDS("../../data-private/derived/manipulation/ds_ukr_rus.rds") # load ds_ukr_rusWe will create a short visualization for each table to make the data easier to understand. You will see several graphs that describe the main features of the data.
This table has 40 rows and 3 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). The value column contains the numeric data for both measures.
The first graph shows the number of copies for each year.
years_all <- seq(min(ds_year$yr), max(ds_year$yr))
g_year_copy <- ds_year %>%
filter(measure == "copy_count") %>%
ggplot(aes(x = yr, y = value)) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = years_all) +
scale_y_continuous(breaks = seq(0, 80000, by = 10000), limits = c(0, 80000)) +
labs(
title = "Number of Copies by Year"
, x = "The year"
, y = "Number of copies (ths.)"
)+
theme_minimal() +
theme(
panel.grid.minor = element_blank()
,panel.border = element_rect(color = "black", fill = NA, size = 1)
)
g_year_copy_plotly <- ggplotly(g_year_copy)
g_year_copy_plotlyThe second graph observe the number of titles for each year.
g_year_title <- ds_year %>%
filter(
measure == "title_count"
) %>%
ggplot(aes(x = yr, y = value)) +
geom_point() +
geom_line() +
scale_x_continuous(breaks = years_all) +
scale_y_continuous(breaks = seq(0, 27500, by = 5000), limits = c(7500, 27500)) +
theme_minimal() +
labs(
title = "Number of Titles by Year"
, x = "The year"
, y = "Number of titles"
) +
theme(panel.grid.minor = element_blank()
,panel.border = element_rect(color = "black", fill = NA, size = 1))
g_year_title_plotly <- ggplotly(g_year_title)
g_year_title_plotlyrm(g_year_title, g_year_copy, years_all)What can we observe?
The most productive year for new titles and the year with the most copies
The overall production trend
This table has 40 rows and 15 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). Starting from the third column, each column represents a separate genre.
In the first graph, we see the number of copies for each genre, by year.
ds_genre_copy <-
ds_genre %>%
filter(
measure == "copy_count"
) %>%
group_by(yr) %>%
pivot_longer(
cols = -c(yr, measure), # adding genre column
names_to = "genre",
values_to = "value"
)
g_genre_copy<- ds_genre_copy %>%
ggplot(aes(x = genre, y = value, fill = genre)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 30000, by = 7500), limits = c(0, 37000)) +
labs(
title = "Number of Copies by Genre and Year",
x = "Genre",
y = "Number of Copies (ths.)",
fill = "Genre"
) +
theme_get() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank()
,legend.position = "bottom"
, panel.spacing.y = unit(5, "lines")
)
g_genre_copy_plotly <- ggplotly(g_genre_copy) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_genre_copy_plotlyIn a second graph, we can see the number of titles for each genre, by year.
ds_genre_title <-
ds_genre %>%
filter(
measure == "title_count"
) %>%
group_by(yr) %>%
pivot_longer(
cols = -c(yr, measure), # adding genre column
names_to = "genre",
values_to = "value"
)
g_genre_title <- ds_genre_title %>%
ggplot(aes(x = genre, y = value, fill = genre)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
labs(
title = "Number of Titles by Genre and Year",
x = "Genre",
y = "Number of Titles",
fill = "Genre"
) +
theme_get() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank()
,legend.position = "bottom"
, panel.spacing.y = unit(5, "lines"),
)
g_genre_title_plotly <- ggplotly(g_genre_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_genre_title_plotlyrm(g_genre_copy, ds_genre_copy, ds_genre_title, g_genre_title)This table has 40 rows and 28 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). Starting from the third column, each column represents a separate geographical region.
region_groups <- list(
"Західна Україна" = c("Львівська", "Івано-Франківська", "Закарпатська", "Тернопільська", "Чернівецька", "Волинська", "Рівненська"),
"Центральна Україна" = c("Київська", "Житомирська", "Черкаська", "Кіровоградська", "Полтавська", "Вінницька"),
"Південна Україна" = c("Одеська", "Миколаївська", "Херсонська", "Запорізька"),
"Східна Україна" = c("Харківська", "Донецька", "Луганська", "Дніпропетровська"),
"Північна Україна" = c("Чернігівська", "Сумська")
, "Крим" = "Автономна Республіка Крим"
)
region_group_df <- tibble::tibble(
region_name = unlist(region_groups),
group = rep(names(region_groups), times = sapply(region_groups, length))
)
g_geography <- ds_geography %>%
pivot_longer(
cols = -c(yr, measure),
names_to = "region_name",
values_to = "value"
) %>%
left_join(region_group_df, by = "region_name") %>%
select(yr, measure, region_name, group, everything())This graph describes the number of copies for each region in Central Ukraine
g_geography_central_copies <- g_geography %>%
filter(measure == "copy_count", group == "Центральна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 750, by = 250)) +
coord_cartesian(ylim = c(0, 750)) +
labs(
title = "Number of Copies by Region (Central Ukraine) and Year",
x = "Region",
y = "Number of Copies (ths.)",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank()
, panel.spacing.y = unit(6, "lines")
, panel.spacing.x = unit(2, "lines")
)
g_geography_central_copies_plotly <- ggplotly(g_geography_central_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_central_copies_plotlyrm(g_geography_central_copies)This graph describes the number of copies for each region in North Ukraine
g_geography_pivnichna_copies <- g_geography %>%
filter(measure == "copy_count", group == "Північна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 500, by = 250)) +
coord_cartesian(ylim = c(0, 500)) +
labs(
title = "Number of Copies by Region (North Ukraine) and Year",
x = "Region",
y = "Number of Copies (ths.)",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_pivnichna_copies_plotly <- ggplotly(g_geography_pivnichna_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_pivnichna_copies_plotlyrm(g_geography_pivnichna_copies)This graph describes the number of copies for each region in South Ukraine
g_geography_pivdena_copies <- g_geography %>%
filter(measure == "copy_count", group == "Південна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 500, by = 250)) +
coord_cartesian(ylim = c(0, 500)) +
labs(
title = "Number of Copies by Region (South Ukraine) and Year",
x = "Region",
y = "Number of Copies (ths.)",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_pivdena_copies_plotly <- ggplotly(g_geography_pivdena_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_pivdena_copies_plotlyrm(g_geography_pivdena_copies)This graph describes the number of copies for each region in West Ukraine
g_geography_zahidna_copies <- g_geography %>%
filter(measure == "copy_count", group == "Західна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 1250, by = 500)) +
coord_cartesian(ylim = c(0, 1250)) +
labs(
title = "Number of Copies by Region (West Ukraine) and Year",
x = "Region",
y = "Number of Copies (ths.)",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_zahidna_copies_plotly <- ggplotly(g_geography_zahidna_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_zahidna_copies_plotlyrm(g_geography_zahidna_copies)This graph describes the number of copies for each region in East Ukraine
g_geography_shidna_copies <- g_geography %>%
filter(measure == "copy_count", group == "Східна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 3500, by = 1000)) +
coord_cartesian(ylim = c(0, 3500)) +
labs(
title = "Number of Copies by Region (East Ukraine) and Year",
x = "Region",
y = "Number of Copies (ths.)",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_shidna_copies_plotly <- ggplotly(g_geography_shidna_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_shidna_copies_plotlyrm(g_geography_shidna_copies)This graph describes the number of copies for Crimea
g_geography_krym_copies <-
g_geography %>%
filter(
measure == "copy_count",
group == "Крим"
) %>%
ggplot(aes(x = yr, y = value, group = 1)) + # add group=1 for geom_line
geom_point() +
geom_line() +
labs(
title = "Number of Copies by Region (Krym) and Year",
x = "Year",
y = "Number of Copies (ths.)"
) +
scale_y_continuous(breaks = seq(0, 1500, by = 250), limits = c(0, 1500)) +
scale_x_continuous(breaks = 2005:2024) +
theme_minimal() +
theme(
panel.grid.minor = element_blank(),
legend.position = "none" # No legend needed, since only one line/region
)
g_geography_krym_copies_plotly <- ggplotly(g_geography_krym_copies)
g_geography_krym_copies_plotlyrm(g_geography_krym_copies)This graph describes the number of titles for each region in Central Ukraine
g_geography_central_title <- g_geography %>%
filter(measure == "title_count", group == "Центральна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 750, by = 250)) +
coord_cartesian(ylim = c(0, 750)) +
labs(
title = "Number of Titles by Region (Central Ukraine) and Year",
x = "Region",
y = "Number of Titles",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_central_title_plotly <- ggplotly(g_geography_central_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_central_title_plotlyrm(g_geography_central_title)This graph describes the number of titles for each region in North Ukraine
g_geography_pivnichna_title <- g_geography %>%
filter(measure == "title_count", group == "Північна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 750, by = 250)) +
coord_cartesian(ylim = c(0, 750)) +
labs(
title = "Number of Titles by Region (North Ukraine) and Year",
x = "Region",
y = "Number of Titles",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_pivnichna_title_plotly <- ggplotly(g_geography_pivnichna_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_pivnichna_title_plotlyrm(g_geography_pivnichna_title)This graph describes the number of titles for each region in South Ukraine
g_geography_pivdena_title <- g_geography %>%
filter(measure == "title_count", group == "Південна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 750, by = 250)) +
coord_cartesian(ylim = c(0, 750)) +
labs(
title = "Number of Titles by Region (South Ukraine) and Year",
x = "Region",
y = "Number of Titles",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_pivdena_title_plotly <- ggplotly(g_geography_pivdena_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_pivdena_title_plotlyrm(g_geography_pivdena_title)This graph describes the number of titles for each region in West Ukraine
g_geography_zahidna_title <- g_geography %>%
filter(measure == "title_count", group == "Західна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 1250, by = 500)) +
coord_cartesian(ylim = c(0, 1250)) +
labs(
title = "Number of Titles by Region (West Ukraine) and Year",
x = "Region",
y = "Number of Titles",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_zahidna_title_plotly <- ggplotly(g_geography_zahidna_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_zahidna_title_plotlyrm(g_geography_zahidna_title)This graph describes the number of titles for each region in East Ukraine
g_geography_shidna_title <- g_geography %>%
filter(measure == "title_count", group == "Східна Україна") %>%
complete(yr, region_name, fill = list(value = 0)) %>%
ggplot(aes(x = region_name, y = value, fill = region_name)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 4000, by = 1000)) +
coord_cartesian(ylim = c(0, 4000)) +
labs(
title = "Number of Titles by Region (East Ukraine) and Year",
x = "Region",
y = "Number of Titles",
fill = "Region"
) +
theme_minimal() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
panel.spacing.y = unit(6, "lines"),
panel.spacing.x = unit(2, "lines")
)
g_geography_shidna_title_plotly <- ggplotly(g_geography_shidna_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_geography_shidna_title_plotlyrm(g_geography_shidna_title)This graph describes the number of titles for Crimea
g_geography_krym_title <-
g_geography %>%
filter(
measure == "title_count",
group == "Крим"
) %>%
ggplot(aes(x = yr, y = value, group = 1)) + # group=1 for line plot
geom_point() +
geom_line() +
labs(
title = "Number of Titles by Region (Krym) and Year",
x = "Year",
y = "Number of Titles"
) +
scale_y_continuous(breaks = seq(0, 1000, by = 250), limits = c(0, 1000)) +
scale_x_continuous(breaks = 2005:2024) +
theme_minimal() +
theme(
panel.grid.minor = element_blank(),
legend.position = "none" # no legend needed
)
g_geography_krym_title_plotly <- ggplotly(g_geography_krym_title)
g_geography_krym_title_plotlyrm(g_geography_krym_title)This table has 14 rows and 39 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). Starting from the third column, each column represents a new language. Data is available only for the years 2018–2024.
To analyze this table, we will examine each year separately. Each graph displays only those languages that have at least 10 different titles and 10,000 copies.
g_language <- ds_language %>%
pivot_longer(
cols = -c(yr, measure),
names_to = "language",
values_to = "value"
)g_language_2018 <-
g_language %>%
filter(
yr == 2018,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +
labs(
title = "Number of Copies and Titles (2018)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom"
, axis.text.x = element_text(size = 8)
)
g_language_2018_plotly <- ggplotly(g_language_2018) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2018_plotlyrm(g_language_2018)g_language_2019 <-
g_language %>%
filter(
yr == 2019,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Rename languages as needed
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 55000, by = 10000), limits = c(0, 55000)) +
labs(
title = "Number of Copies and Titles (2019)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2019_plotly <- ggplotly(g_language_2019) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2019_plotlyrm(g_language_2019)g_language_2020 <-
g_language %>%
filter(
yr == 2020,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Rename language as needed:
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +
labs(
title = "Number of Copies and Titles (2020)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2020_plotly <- ggplotly(g_language_2020) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2020_plotlyrm(g_language_2020)g_language_2021 <-
g_language %>%
filter(
yr == 2021,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Rename language as needed:
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 40000, by = 5000), limits = c(0, 40000)) +
labs(
title = "Number of Copies and Titles (2021)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2021_plotly <- ggplotly(g_language_2021) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2021_plotlyrm(g_language_2021)g_language_2022 <-
g_language %>%
filter(
yr == 2022,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Optional: rename any language here
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 15000, by = 5000), limits = c(0, 15000)) +
labs(
title = "Number of Copies and Titles (2022)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2022_plotly <- ggplotly(g_language_2022) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2022_plotlyrm(g_language_2022)g_language_2023 <-
g_language %>%
filter(
yr == 2023,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Optional: rename any language as needed
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 25000, by = 5000), limits = c(0, 25000)) +
labs(
title = "Number of Copies and Titles (2023)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2023_plotly <- ggplotly(g_language_2023) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2023_plotlyrm(g_language_2023)g_language_2024 <-
g_language %>%
filter(
yr == 2024,
(measure == "copy_count" & value > 10) | (measure == "title_count" & value > 10)
) %>%
mutate(
measure = recode(measure,
copy_count = "Copies",
title_count = "Titles"),
# Optional: rename any language as needed
language = recode(language,
`Кількома мовами народів світу` = "Декілька")
) %>%
ggplot(aes(x = language, y = value, fill = measure)) +
geom_col(position = "dodge") +
scale_y_continuous(breaks = seq(0, 35000, by = 5000), limits = c(0, 35000)) +
labs(
title = "Number of Copies and Titles (2024)",
x = "Language",
y = "Count",
fill = "Measure"
) +
theme_minimal() +
theme(
legend.position = "bottom",
axis.text.x = element_text(size = 8)
)
g_language_2024_plotly <- ggplotly(g_language_2024) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_language_2024_plotlyrm(g_language_2024)This table has 40 rows and 16 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). Starting from the third column, each column represents a different publication type.
The first graph shows the number of copies for each publication type, by year.
ds_pubtype_l <- ds_pubtype %>%
pivot_longer(
cols = -c(yr, measure),
names_to = "pubtype",
values_to = "value"
)
ds_pubtype_copy <-
ds_pubtype_l %>%
filter(measure == "copy_count") %>%
group_by(yr)
g_pubtype_copy <- ds_pubtype_copy %>%
ggplot(aes(x = pubtype, y = value, fill = pubtype)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 30000, by = 7500), limits = c(0, 37000)) +
coord_cartesian(ylim = c(0, 37000)) +
labs(
title = "Number of Copies by Publication Type and Year",
x = "Publication Type",
y = "Number of Copies (ths.)",
fill = "Publication Type"
) +
theme_get() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank()
,legend.position = "bottom"
, panel.spacing.y = unit(5, "lines")
)
g_pubtype_copy_plotly <- ggplotly(g_pubtype_copy) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_pubtype_copy_plotlyrm(ds_pubtype_l, ds_pubtype_copy, g_pubtype_copy)Second graph describes the number of titles of each publication type, by year.
ds_pubtype_l <- ds_pubtype %>%
pivot_longer(
cols = -c(yr, measure),
names_to = "pubtype",
values_to = "value"
)
ds_pubtype_title <-
ds_pubtype_l %>%
filter(measure == "title_count") %>%
group_by(yr)
g_pubtype_title <- ds_pubtype_title %>%
ggplot(aes(x = pubtype, y = value, fill = pubtype)) +
geom_col() +
facet_wrap(~ yr, ncol = 4) +
scale_y_continuous(breaks = seq(0, 9000, by = 2500), limits = c(0, 9000)) +
coord_cartesian(ylim = c(0, 9000)) +
labs(
title = "Number of Titles by Publication Type and Year",
x = "Publication Type",
y = "Number of Titles",
fill = "Publication Type"
) +
theme_get() +
theme(
axis.text.x = element_blank(),
axis.title.x = element_blank(),
legend.position = "bottom",
panel.spacing.y = unit(5, "lines")
)
g_pubtype_title_plotly <- ggplotly(g_pubtype_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.1,
xanchor = "center"
)
)
g_pubtype_title_plotlyrm(ds_pubtype_title, g_pubtype_title)This table has 40 rows and 16 columns. In the measure column, there are only two possible values: title_count (the number of titles) and copy_count (the number of copies). The third column shows the percentage of Ukrainian books among Ukrainian and Russian editions. The fourth column shows the percentage of Russian books among Ukrainian and Russian editions.
The first graph shows the number of copies of Ukrainian and Russian books for each year from 2005 to 2024.
g_ukrrus_copies <-
ds_ukr_rus %>%
filter(measure == "copy_count") %>%
select(yr, ukr, rus) %>%
pivot_longer(
cols = c(ukr, rus),
names_to = "language",
values_to = "value"
) %>%
ggplot(aes(x = factor(yr), y = value, fill = language)) +
geom_col(position = "dodge") +
scale_x_discrete(expand = expansion(mult = c(0))) +
labs(
title = "Number of Copies: Ukrainian vs Russian by Year",
x = "Year",
y = "Number of Copies (ths.)",
fill = "Language"
) +
theme_minimal() +
theme(
legend.position = "bottom"
)
g_ukrrus_copies_plotly <- ggplotly(g_ukrrus_copies) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_ukrrus_copies_plotlyrm(g_ukrrus_copies)Second graph describes the number of titles of Ukrainian and russian titles for each year since 2005 to 2024.
g_ukrrus_title <-
ds_ukr_rus %>%
filter(measure == "title_count") %>%
select(yr, ukr, rus) %>%
pivot_longer(
cols = c(ukr, rus),
names_to = "language",
values_to = "value"
) %>%
ggplot(aes(x = factor(yr), y = value, fill = language)) +
geom_col(position = "dodge") +
scale_x_discrete(expand = expansion(mult = c(0))) +
scale_y_continuous(breaks = seq(0, 20000, by = 5000), limits = c(0, 20000)) +
labs(
title = "Number of Titles: Ukrainian vs Russian by Year",
x = "Year",
y = "Number of Titles",
fill = "Language"
) +
theme_minimal() +
theme(
legend.position = "bottom"
)
g_ukrrus_title_plotly <- ggplotly(g_ukrrus_title) %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
y = -0.15,
xanchor = "center"
)
)
g_ukrrus_title_plotlyrm(g_ukrrus_title)This document was created to help you gain a deeper understanding of the data. I hope you found it useful! 😉